1. Introduction


There are many numerical procedures for calculating the maximum likelihood 
estimates for loglinear models of frequency data. The most popular methods are 
the Iterative Proportional Fitting Procedure (IPFP) and variants of Newton's 
method. For problems involving a large number of parameters, Newton's method is
often impractical. On the other hand, many models cannot be expressed in a
form which allows the simple IPFP to be applied. In these circumstances some
other nonlinear optimization technique (e.g. the Generalized Iterative Scaling 
method of Darroch and Ratcliff (1972) or the extensions of the IPFP due to 
Haberman (1974)) must be used. As the basic IPFP is a well understood, robust, 
and widely available algorithm it would often be desirable to cajole a given 
problem into a form where the IPFP can be applied. We present a general 
theorem on transforming contingency tables and several applications where the 
transformation technique has allowed us to take advantage of the IPFP and 
resulted in simple and useful procedures. A further advantage of this technique
is that it is sometimes possible to recognize closed-form estimates in
the transformed problem while they would be overlooked in the original setting.

We shall view the estimation problem as one of minimizing the Kullback-Leibler
information distance between two probability mass functions (p.m.f.'s)
and will roughly follow the notation of Csiszar (1976). Although we have 
adopted the information distance point of view, the duality between maximum 
likelihood estimation and minimum information estimation (see e.g. Darroch and 
2. Background and Notation

Csiszar (1976) presents a very elegant discussion of the IPFP by developing
a "geometry" for the information measure. A simplified version of the chief
results of this theory is outlined below. Let n, p, q, r, s, and t denote
p.m.f.'s which are non-zero for all elements of a finite set I. The Kullback-Leibler
information number (or directed divergence) specifies a distance,

$$I(p\,\|\,q) = \sum_{i \in I} p(i) \ln\bigl(p(i)/q(i)\bigr) ,$$

between p and q. The principle of minimum discriminant information, as
formulated by Kullback (1959), aims to minimize the distance between a reference
distribution, q above, and a family of other distributions. The properties of
such estimates have been studied extensively. The most important results can
be found in Kullback (1959) and are summarized, with a special emphasis on
contingency tables, in Gokhale and Kullback (1978).
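
As a small illustration of the information number defined above, the following sketch computes I(p||q) for two short p.m.f.'s. The function name `kl_divergence` and the example vectors are illustrative assumptions, not part of the original text; the code assumes, as the text does, that both p.m.f.'s are strictly positive.

```python
import math

def kl_divergence(p, q):
    """Directed divergence I(p||q) = sum_i p(i) * ln(p(i) / q(i)).

    Assumes p and q are equal-length sequences of strictly positive
    numbers, matching the text's restriction to non-zero p.m.f.'s.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical example vectors (not taken from the paper).
p = [0.5, 0.5]
q = [0.25, 0.75]
d_pq = kl_divergence(p, q)   # I(p||q)
d_qp = kl_divergence(q, p)   # I(q||p); in general not equal to I(p||q)
```

The two directed values differ, which is why the text speaks of a "directed" divergence rather than a metric.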

We next develop an appropriate family, E, of p.m.f.'s. A convex set, E,
of p.m.f.'s is called linear if when p and q are in E and $t = a \cdot p + (1-a) \cdot q$
($a \in \mathbb{R}$) is a p.m.f., then t is also in E. A p.m.f. q which satisfies

$$I(q\,\|\,r) = \min_{p \in E} I(p\,\|\,r)$$

is called the I-projection of r on E and will be denoted by $q = \mathbb{P}_E(r)$.

Csiszar gives conditions under which $\mathbb{P}_E(r)$ exists (it is always unique) and
develops a geometry for I-projections by using an analogue of Pythagoras'
Theorem. Now let $F = \{f_\gamma : \gamma \in \Gamma\}$ be a set of real valued functions on I and
$A = \{a_\gamma : \gamma \in \Gamma\}$ be real constants. Define $M_F$ to be span(F). A linear set, E,
can be constructed by considering the set of p for which

$$\sum_{i \in I} p(i) \cdot f_\gamma(i) = a_\gamma ; \quad \gamma \in \Gamma .$$

When we consider s to be an observed probability function and

$$a_\gamma = \sum_{i \in I} s(i) f_\gamma(i) ; \quad \gamma \in \Gamma$$

then the duality between maximum likelihood and minimum discriminant
estimation states that if

$$\hat{q} = \mathbb{P}_E(r)$$

then

$$\ln(\hat{q}) \in M_F + \ln(r)$$

and

$$(\hat{q} - s) \in M_F^{\perp} ,$$


i.e. $\hat{q}$ is the m.l.e. (under Poisson sampling) for the corresponding
log-affine model. Csiszar's principal theorem says that if E is the finite
intersection of the linear sets $E_k$ (i.e. $E = \bigcap_{k \in K} E_k$) then $\hat{q} = \mathbb{P}_E(r)$
is the pointwise limit of $\{q_n\}$, n = 1,2,3,..., where $q_0 = r$,
$q_n = \mathbb{P}_{E_n}(q_{n-1})$ and

$$E_n = E_i \quad \text{if } i \equiv n \bmod |K| .$$
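
The cyclic scheme just described can be sketched for the most familiar case, a two-way table whose two linear sets fix the row sums and the column sums. Rescaling each row (or column) proportionally is the projection onto the corresponding set, so alternating the two steps is the basic IPFP. The function name `ipfp`, the fixed cycle count, and the example table are assumptions made for illustration; the sketch works with unnormalized positive tables rather than p.m.f.'s.

```python
def ipfp(start, row_targets, col_targets, cycles=100):
    """Cyclically rescale rows, then columns, of a positive two-way table."""
    fit = [row[:] for row in start]
    n_rows, n_cols = len(fit), len(fit[0])
    for _ in range(cycles):
        # Step onto the set of tables having the required row sums.
        for i in range(n_rows):
            scale = row_targets[i] / sum(fit[i])
            fit[i] = [x * scale for x in fit[i]]
        # Step onto the set of tables having the required column sums.
        for j in range(n_cols):
            col_sum = sum(fit[i][j] for i in range(n_rows))
            scale = col_targets[j] / col_sum
            for i in range(n_rows):
                fit[i][j] *= scale
    return fit

# Hypothetical observed table; the start table r is constant.
observed = [[2.0, 3.0], [3.0, 10.0]]
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
fitted = ipfp([[1.0, 1.0], [1.0, 1.0]], rows, cols)
```

With a constant starting table the limit is the usual independence fit (row total times column total over grand total), which is reached after a single cycle.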

Example 1 . Ordered Categories 

Let p be an observed 3x3 probability function obtained via multinomial 
sampling and consider the ordered categories model 

$$E(p_{ij}) = q_{ij}$$

and

$$\ln(q_{ij}) = \alpha_i + \beta_j + j \cdot \gamma_i + i \cdot \delta_j ; \quad i,j = 1,2,3 .$$


The linear manifold for this model is spanned by a set of
tables, $f_R^i$, $f_{OR}^i$, $f_C^j$ and $f_{OC}^j$; i,j = 1,2,3. The subscripts R, OR, C and
OC indicate that the vector corresponds to Row, Ordered Row, Column or
Ordered Column parts of the model, while the superscript indicates the
row or column number.

The general structure is that $f_R^i$ (or $f_C^j$) is a table of zeros except
for the i'th row (j'th column) which contains ones, i.e.,

$$f_R^i(k,\ell) = \begin{cases} 1 & k = i \\ 0 & k \neq i . \end{cases}$$

Similarly, for the ordered row and column tables, the general form is

$$f_{OC}^j(k,\ell) = \begin{cases} k-1 & \ell = j \\ 0 & \ell \neq j . \end{cases}$$


We now group the spanning tables into sets of related constraints. Let

$$F_R = \{f_R^i, f_{OR}^i : i = 1,2,3\}$$

and

$$F_C = \{f_C^j, f_{OC}^j : j = 1,2,3\} .$$

The sets of constants, $A_R$ and $A_C$, are determined by the inner products
of p with the spanning vectors.
The linear spaces of p.m.f.'s corresponding to these constraints and
constants are:

$$E_R = \Bigl\{\text{p.m.f.'s } p \text{ s.t. } \sum_{k,\ell} f_A^i(k,\ell) \cdot p(k,\ell) = a_A^i ; \; A = R, OR; \; i = 1,2,3 \Bigr\}$$

$$E_C = \Bigl\{\text{p.m.f.'s } p \text{ s.t. } \sum_{k,\ell} f_B^j(k,\ell) \cdot p(k,\ell) = a_B^j ; \; B = C, OC; \; j = 1,2,3 \Bigr\}$$

In order to find the M.L.E.'s of cell probabilities for this model we
need to be able to compute $\hat{q} = \mathbb{P}_E(r)$ for $r(k,\ell) = 1, \forall k,\ell$ and
$E = E_R \cap E_C$. The theory tells us that this I-projection can be obtained by
cyclically projecting onto $E_R$ and $E_C$.

■ 









3. Motivation for Transformations


As algorithms for the basic IPFP are widely available, it is often 
advantageous for us to be able to pose a problem in a way that makes it 
amenable to attack by means of these programs. 

A very simple example, which is prototypical of those that will 
arise in our later discussion, can be constructed as follows. 


Example 2 

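Consider a triple of observed counts $z = (z_1, z_2, z_3)$ from 3
independent Poisson random variables with mean $m = (m_1, m_2, m_3)$ and
having observed values (1, 3, 5). Suppose we wish to fit the log-affine
model,

$$\ln(m) \in \ln\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + M \quad \text{where} \quad M = \operatorname{span}\left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \right\} .$$

It is a simple matter to verify that the M.L.E. is
$\hat{m} = (.694, 3.611, 4.694)$. Now consider the related contingency table

$$z^* = \begin{pmatrix} 2z_1 & z_2 \\ z_2 & 2z_3 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 3 & 10 \end{pmatrix}$$
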


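and the model for the mean

$$\ln(m^*) \in M^* \quad \text{where} \quad M^* = \operatorname{span}\left\{ \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} \right\} ,$$

the "independence" manifold. This model has a closed-form M.L.E., namely,

$$\hat{m}^* = \begin{pmatrix} 5 \times 5/18 & 5 \times 13/18 \\ 5 \times 13/18 & 13 \times 13/18 \end{pmatrix} = \begin{pmatrix} 1.389 & 3.611 \\ 3.611 & 9.389 \end{pmatrix} .$$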


Now note that

$$\hat{m}^* = \begin{pmatrix} 2\hat{m}_1 & \hat{m}_2 \\ \hat{m}_2 & 2\hat{m}_3 \end{pmatrix} .$$

In other words it is possible to fit the "difficult" model, $M$, by
transforming the table and fitting the "easy" model, $M^*$, to the
transformed table. In the process of doing this transformation we
have also recognized that the original log-affine model actually had
closed-form estimates, namely

$$\hat{m}_1 = (2z_1 + z_2)^2 / \bigl(4 (z_1 + z_2 + z_3)\bigr)$$

$$\hat{m}_2 = (2z_1 + z_2)(2z_3 + z_2) / \bigl(2 (z_1 + z_2 + z_3)\bigr)$$

$$\hat{m}_3 = (2z_3 + z_2)^2 / \bigl(4 (z_1 + z_2 + z_3)\bigr)$$


This example is clearly contrived to please Dr. Pangloss. We shall 
later present a more realistic version with similar consequences. 
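
The arithmetic of Example 2 can be checked with a few lines of code. This is only a sketch: the function name `closed_form_mle` is an illustrative assumption, and the membership check assumes the model offset is ln(1, 2, 1) with M spanned by (2, 1, 0) and (0, 1, 2).

```python
def closed_form_mle(z1, z2, z3):
    """Closed-form estimates obtained from the independence fit of z*."""
    total = z1 + z2 + z3
    v1 = 2 * z1 + z2          # first margin of z* = [[2*z1, z2], [z2, 2*z3]]
    v2 = z2 + 2 * z3          # second margin of z*
    # Independence fit of z*; the grand total of z* is 2 * total.
    m11 = v1 * v1 / (2 * total)
    m12 = v1 * v2 / (2 * total)
    m22 = v2 * v2 / (2 * total)
    # Undo the transformation: halve the diagonal cells.
    return (m11 / 2, m12, m22 / 2)

m_hat = closed_form_mle(1, 3, 5)
```

The fitted values reproduce the two sufficient statistics, 2z_1 + z_2 = 5 and z_2 + 2z_3 = 13, and satisfy (m_2 / 2)^2 = m_1 m_3, which is the condition for ln(m) to lie in the assumed affine manifold.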


In the preceding example we transformed the data into a form where
it was much easier to compute the M.L.E. of the vector of expected 
values. Of course we have yet to prove that the above manipulation is 
any more than a numerical coincidence; such proofs are the subject of 
this paper. 

The idea of modifying a problem so that it is amenable to analysis
by existing or easier methods is not at all new. An old example of
this phenomenon is the method of filling in missing values to transform
an "unbalanced" analysis of variance into a "balanced" problem. Although
fitting an ANOVA model to an incomplete data array is conceptually easy,
the calculations are much simpler when the missing values are filled in.
The same is true of Example 2. Fitting the model $M$ is not difficult but
the model $M^*$ is much simpler.

For such a small problem as Example 2 there is little practical 
advantage to be gained from the transformation technique. The motivation 
for this research lies in some very large problems considered by Fienberg 
and Wasserman (1981). We discuss their examples and some related theory 
in section 5. 

Thus far we have not given any motivation for the data transformation
of Example 2. We now continue the example and give a heuristic
justification of the method and at the same time present a more realistic version
of this problem.
Example 2 (continued)

Let us consider a general log-affine model for the Poisson data, z,
with mean value, m, namely

$$\ln(m) \in \ln(d) + M$$

where d is any fixed triple of positive numbers and M is as before.
Note that if d is the vector of all ones then this reduces to a
simple log-linear model. Regardless of d, a version of the sufficient
statistics for this model is

$$v_1 = 2z_1 + z_2 \quad \text{and} \quad v_2 = z_2 + 2z_3 .$$

Now consider the table z* as a transformation, g, of z, i.e.

$$z^* = g(z) = \begin{pmatrix} 2z_1 & z_2 \\ z_2 & 2z_3 \end{pmatrix} .$$

We now note that $z^*_{1+} = z^*_{+1} = v_1$ and $z^*_{2+} = z^*_{+2} = v_2$. In other
words the sufficient statistics for the data z with model $M$ are
represented twice in the margins of z*. Thus if we fit the row and
column margins model, $M^*$, to z* we might expect that the likelihood equation
for model $M$ is also satisfied. This turns out to be the case, but we have
ignored the question of whether $\hat{m}$ satisfies the log-affine model. We shall
see that if we fit the log-affine model

(3.1.1)   $\ln(m^*) \in \ln(g(d)) + M^*$

to the data z* then the M.L.E., $\hat{m}$, can be recovered. The simple IPFP,
with starting table g(d), will converge to the M.L.E.
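
The recipe of the continued example (transform z and d with g, run the simple IPFP from the starting table g(d), then invert g) might be sketched as follows. The helper names `g`, `g_inverse`, `ipfp_2x2` and `fit_log_affine`, the fixed number of cycles, and the choice to average the two off-diagonal cells when inverting are all assumptions made for this illustration.

```python
def g(v):
    """Map a triple to the 2 x 2 table [[2*v1, v2], [v2, 2*v3]]."""
    v1, v2, v3 = v
    return [[2.0 * v1, v2], [v2, 2.0 * v3]]

def g_inverse(table):
    """Recover a triple; the off-diagonal cells agree at convergence."""
    return (table[0][0] / 2.0,
            (table[0][1] + table[1][0]) / 2.0,
            table[1][1] / 2.0)

def ipfp_2x2(start, data, cycles=200):
    """Fit the row and column margins of `data`, starting from `start`."""
    fit = [row[:] for row in start]
    row_t = [sum(r) for r in data]
    col_t = [data[0][j] + data[1][j] for j in range(2)]
    for _ in range(cycles):
        for i in range(2):
            s = row_t[i] / sum(fit[i])
            fit[i] = [x * s for x in fit[i]]
        for j in range(2):
            s = col_t[j] / (fit[0][j] + fit[1][j])
            fit[0][j] *= s
            fit[1][j] *= s
    return fit

def fit_log_affine(z, d):
    """Estimate m for ln(m) in ln(d) + M via the transformed table."""
    return g_inverse(ipfp_2x2(g(d), g(z)))

m_121 = fit_log_affine((1, 3, 5), (1, 2, 1))   # offset used in Example 2
m_111 = fit_log_affine((1, 3, 5), (1, 1, 1))   # plain log-linear case
```

For d = (1, 2, 1) the starting table g(d) is constant, so the fit coincides with the independence fit; for other d the IPFP needs several cycles.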




In section 4 we discuss what conditions are necessary to justify 
procedures such as those discussed above. 


4. A Transformation Theorem 

We present a collection of conditions (grandiloquently labelled as
a theorem) relating to how one may transform estimation problems. First
we consider a very weak condition which will be used in the theorem and
which is itself sometimes useful.

The idea of this first result is that it is often possible to 
fortuitously solve a difficult estimation problem by "accidentally" 
satisfying the conditions. Consider the problem 

maximize f(m|z)

subject to $m \in V$

where V is some constraint space. Assume f has a unique maximum over V
and denote the maximizing m by $\hat{m}$. Now consider the problem

maximize f(m|z)

subject to $m \in V'$

where $V \subset V'$. Denote the maximizing m by $\hat{m}'$. It is a trivial
observation that if $\hat{m}' \in V$ then $\hat{m}' = \hat{m}$. In other words, if the maximizing
value, $\hat{m}'$, under the weaker conditions, $V'$, happens to satisfy the stronger
conditions, V, then $\hat{m}'$ is also the maximizer under the stronger conditions.
Notice also that we did not require $\hat{m}'$ to be unique as the uniqueness of
$\hat{m}$ implies there is at most one $\hat{m}'$ in V. This idea could be used anywhere
a constrained maximum is required but there is no guarantee that $\hat{m}'$ will
be in V. We will use this general idea in frequency data circumstances
where we can prove that $\hat{m}'$ will be in V and where the constraints $V'$ are
easier to deal with than the constraints V.

We now turn to a more refined version of this method. The statement
of the result is in terms of the Kullback-Leibler distance but could
equally be stated in terms of the (dual) likelihood function.

Theorem

Let g be a one to one mapping of the p.m.f.'s on a set I into the
p.m.f.'s on a set I*. If E is a linear set of p.m.f.'s on I, then define
$g(E) = \{g(p) : p \in E\}$. Let E* be a linear set of p.m.f.'s on I* such that
$g(E) \subset E^*$. If g is such that

(4.1)   $I(p\,\|\,q) = k \cdot I(g(p)\,\|\,g(q))$ for $p, q \in E$,

and if $\mathbb{P}_{E^*}(g(r)) \in g(E)$, then

$$\mathbb{P}_E(r) = g^{-1}\bigl(\mathbb{P}_{E^*}(g(r))\bigr) .$$

The condition (4.1) could be generalized to allow $I(p\,\|\,q) = f(I(g(p)\,\|\,g(q)))$
where f is any monotone one to one mapping. We have
no need for such generality here.

The theorem shows that under certain conditions it is possible to 
calculate an I-projection in a transformed table and then invert the 
transformation to obtain the I-projection in the original setting. 

Verifying the conditions of the theorem may itself be a difficult task.
There are at least two ways of using the theorem. In some situations it
may be possible to define the linear set E* so that g(E) = E*. This
is the easier case and it essentially just relabels the problem. However
even such simple relabeling can be helpful in interpreting the model or
recognizing, say, a model in the transformed space for which closed form
estimates are known to exist. The second application of the theorem
requires more work to verify the conditions, but is also more generally
applicable. Here we take a linear set E* which is much larger than g(E),
but we then need to prove that $\mathbb{P}_{E^*}(g(r)) \in g(E)$. In other words, even
though E* contains g(E) we need to show that for any g(r), the I-projection
onto E* is always an element of g(E). For a particular set of data it may
be easy to verify this condition. All we need do is fit the transformed
model and see if the I-projection is in g(E). To prove this type of
result for a general class of problems is more difficult. We will
illustrate the simple case of the theorem with the following examples.
Section 5 will be devoted to a discussion of a set of examples where
$g(E) \subset E^*$.
Example 3

This example is a continuation of Example 1. The problem concerns 
a 3 x 3 table where the classifying variables have a natural ordering. 
The specific model we consider fits row and column margins and linearly- 
weighted row and column margins. 

We have previously shown that the row and column constraints can
be considered in pairs and each of the pairs of constraints can be
individually fit. Thus if $(w_1, w_2, w_3)$ are the current fitted values for,
say, the first row, we need to adjust this triple so that its row and
ordered row margins match some specified constants.

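Let $E_S$ be the set of positive triples which satisfy the row and
ordered row constraints for the first row, i.e.,

$$E_S = \bigl\{\text{positive triples } q : 2q_1 + q_2 = 2a_R^1 - a_{OR}^1 = a_3 \text{ and } q_2 + 2q_3 = a_{OR}^1 = a_4 \bigr\} .$$

Now consider the function

$$g(w_1, w_2, w_3) = \begin{pmatrix} w_1 & \tfrac{1}{2} w_2 \\ \tfrac{1}{2} w_2 & w_3 \end{pmatrix}$$

and define

$$E^* = g(E_S) = \Bigl\{ 2 \times 2 \text{ tables } \begin{pmatrix} a & b \\ c & d \end{pmatrix} \text{ such that } a + b = a + c = \tfrac{1}{2} a_3 \text{ and } d + c = d + b = \tfrac{1}{2} a_4 \Bigr\} .$$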
Note that the constraints on E* imply that b equals c, which means that
$g^{-1}$ is well defined on E*. It is not a difficult calculation to verify
that $I(q\,\|\,w) = I(g(q)\,\|\,g(w))$. Our theorem now allows us to calculate
$\mathbb{P}_{E_S}(w)$ as $g^{-1}(\mathbb{P}_{E^*}(g(w)))$.


The constraints which define E* are just simple row and column
margins. Thus the I-projection, $\mathbb{P}_{E^*}(g(w))$, can be calculated by the
usual IPFP (i.e., adjusting row and column margins), or, as it is a
2 x 2 table, by direct calculation. As the logarithms of the starting
values, w, do not necessarily satisfy the model, the IPFP will in
general require several iterations to converge. Thus to obtain the
I-projection, $\mathbb{P}_{E_R}(q_n)$, where $E_R$ is the space of P.D.'s which
satisfy all of the row constraints, we could transform each row of the
3 x 3 table into a 2 x 2 table, calculate with the 2 x 2 table and then
use $g^{-1}$ to return a triple of fitted values. The approach for the
columns would be similar.
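
The single-row step of Example 3 can be sketched in code. The sketch assumes the map g(w) = [[w1, w2/2], [w2/2, w3]] and the constraints 2q_1 + q_2 = a_3, q_2 + 2q_3 = a_4; the function names, the fixed cycle count and the numerical inputs are hypothetical. It also checks the divergence identity I(q||w) = I(g(q)||g(w)) on one pair of triples.

```python
import math

def g(w):
    """Map a triple to a 2 x 2 table, splitting the middle value in two."""
    w1, w2, w3 = w
    return [[w1, w2 / 2.0], [w2 / 2.0, w3]]

def g_inverse(t):
    """Recombine the two off-diagonal cells into the middle value."""
    return (t[0][0], t[0][1] + t[1][0], t[1][1])

def kl(p, q):
    """Directed divergence between two positive sequences."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def flatten(t):
    return [x for row in t for x in row]

def project_row(w, a3, a4, cycles=200):
    """Adjust the triple w so that 2*q1 + q2 = a3 and q2 + 2*q3 = a4."""
    fit = g(w)
    targets = [a3 / 2.0, a4 / 2.0]   # shared row and column margins
    for _ in range(cycles):
        for i in range(2):
            s = targets[i] / sum(fit[i])
            fit[i] = [x * s for x in fit[i]]
        for j in range(2):
            s = targets[j] / (fit[0][j] + fit[1][j])
            fit[0][j] *= s
            fit[1][j] *= s
    return g_inverse(fit)

# Hypothetical current fitted values and target constants.
q = project_row((0.2, 0.5, 0.3), a3=0.9, a4=1.1)
```

Because the two off-diagonal cells of g(w) each carry half of w_2, their two logarithmic terms add up to the single middle term of the original divergence, which is why the constant k in (4.1) equals one here.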

There is another g, which transforms the entire 3 x 3 table into
a 2 x 2 x 2 x 2 table. In this case E* = g(E) becomes the model of
no fourth order interaction for the $2^4$ table.
It is not difficult to check that the model of no fourth order interaction
corresponds to g(E) and that $I(p\,\|\,q) = I(g(p)\,\|\,g(q))$. Therefore
the usual IPFP, with starting values g(e) and the model of no fourth
order interaction applied to $g(q_n)$, will yield a $2^4$ table of fitted
values which can in turn be transformed (by $g^{-1}$) into a 3 x 3 table for
the original problem.

Example 4 . Paired Comparison Models. 

Davidson and Beaver (1977) have considered a generalization of the 
Bradley-Terry model for paired comparisons which allows for ties and 
order effects. Fienberg (1979) demonstrated that the models of David¬ 
son and Beaver were loglinear models and showed how the generalized 
iterative scaling method of Darroch and Ratcliff (1972) can be used for 
these models. We show how the simple IPFP can also be used to do the 
estimation. 

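Consider the K x K x 3 contingency table $z = \{z_{ijk}\}$ with mean,
$m = \{m_{ijk}\}$. The loglinear model corresponding to the Davidson-Beaver
model is (see Fienberg (1979)),

$$\ln(m_{ij1}) = \mu + \alpha_{ij} + \beta_1 + \delta_i ,$$

$$\ln(m_{ij2}) = \mu + \alpha_{ij} + \beta_2 + \delta_j ,$$

and

$$\ln(m_{ij3}) = \mu + \alpha_{ij} + \beta_3 + \tfrac{1}{2}(\delta_i + \delta_j) ,$$

for which the sufficient statistics are

$$\{z_{ij+}\}, \quad \{z_{++k}\}, \quad \text{and} \quad \{z_{i+1} + z_{+i2} + \tfrac{1}{2}(z_{i+3} + z_{+i3})\} .$$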
Thus the likelihood equations are

(4.2)   $\hat{m}_{ij+} = z_{ij+}$ ,   i,j = 1,2,..., K

(4.3)   $\hat{m}_{++k} = z_{++k}$ ,   k = 1,2,3

and

(4.4)   $\hat{m}_{i+1} + \hat{m}_{+i2} + \tfrac{1}{2}(\hat{m}_{i+3} + \hat{m}_{+i3}) = z_{i+1} + z_{+i2} + \tfrac{1}{2}(z_{i+3} + z_{+i3})$ ,   i = 1,2,..., K .

Fienberg (1979, p. 481) writes out the Darroch and Ratcliff algorithm
for this problem.



We transform z into the K x K x 4 table z* where

(4.5)   $z^*_{ij1} = 2 \times z_{ij1}$

(4.6)   $z^*_{ij2} = 2 \times z_{ij2}$

(4.7)   $z^*_{ij3} = z_{ij3}$

(4.8)   $z^*_{ij4} = z_{ij3}$ ,   i,j = 1,2,..., K ,

with transformed likelihood equations

(4.9)   $\hat{m}^*_{ij+} = z^*_{ij+}$ ,   i,j = 1,2,..., K

(4.10)   $\hat{m}^*_{++k} = z^*_{++k}$ ,   k = 1,2,3

(4.11)   $\hat{m}^*_{i+1} + \hat{m}^*_{+i2} + \hat{m}^*_{i+3} + \hat{m}^*_{+i3} = z^*_{i+1} + z^*_{+i2} + z^*_{i+3} + z^*_{+i3}$ ,   i = 1,2,..., K

(4.12)   $\hat{m}^*_{i+1} + \hat{m}^*_{+i2} + \hat{m}^*_{i+4} + \hat{m}^*_{+i4} = z^*_{i+1} + z^*_{+i2} + z^*_{i+4} + z^*_{+i4}$ ,   i = 1,2,..., K


As the likelihood equations involve simple sums of cell counts, the
basic IPFP may be used for this problem. To invert the transformation
(4.5) - (4.8) it is necessary that $\hat{m}^*_{ij4} = \hat{m}^*_{ij3}$. Equations
(4.9) and (4.12) ensure this. Thus the M.L.E. $\hat{m}$ is

(4.13)   $\hat{m}_{ij1} = \tfrac{1}{2} \hat{m}^*_{ij1}$

(4.14)   $\hat{m}_{ij2} = \tfrac{1}{2} \hat{m}^*_{ij2}$

(4.15)   $\hat{m}_{ij3} = \hat{m}^*_{ij3} = \hat{m}^*_{ij4}$

To make the argument rigorous it is necessary to show that if $\hat{m}^*$
satisfy (4.9) - (4.12) then

(i)   $\hat{m}^*_{ij3} = \hat{m}^*_{ij4}$

and

(ii)   $\hat{m}_{ijk}$ defined by (4.13) - (4.15) satisfy (4.2) - (4.4).

Condition (i) has already been mentioned and condition (ii) is easily
verified by substitution.

This example has again been a case where the transformed table and
model are in one to one correspondence with the original table and
model. The transformed model can be fitted using the simple IPFP but
as the sufficient statistics are not only margins of z*, many standard
computer packages would have difficulty with this problem.
5. Social Networks


In recent years there has been an increasing interest in models 
for the analysis of data from social networks. A line of research 
described by Holland and Leinhardt (1981) and further developed by 
Fienberg and Wasserman (1981) and Fienberg, Meyer and Wasserman (1981) 
has been particularly fruitful. 

The basic data for these models consist of observations on the
arcs of a directed graph (digraph) on g nodes. The nodes, often
taken to represent individuals or organizations in a community, are 
called actors. The directed arcs linking the actors represent such 
notions as the attitudes of an individual toward another or the flows 
of resources between organizations. 

A social network with a single relationship connecting actors can
be described by an adjacency matrix, X, with entries

$$X_{ij} = \begin{cases} 1 & \text{if actor } i \text{ connects to actor } j \ (i \neq j) \\ 0 & \text{otherwise.} \end{cases}$$

Holland and Leinhardt (1981) develop a model, which they refer to as
$p_1$, and several submodels for such digraph data. Fienberg and
Wasserman (1981) extend these models to the case where the actors form 
disjoint groups and interest lies in the flows between groups. 
Fienberg, Meyer and Wasserman (1981) further extend these results to 
the situation where more than one relationship is observed between the 
actors or groups. 

From a computational point of view all of these models are 
similar. For each of them the likelihood function can be viewed 
as the Poisson likelihood and the models are either loglinear models or
affine transformations of loglinear models for the mean-value parameter.
There is a further similarity in that for each case a natural
presentation of the data involves non-rectangular data arrays but there exist
transformations of the data into rectangular structures for which the
transformed sufficient statistics are simple margins. We will consider
the simple version of the problem, involving a single relationship
between actors and the most general version, involving multiple
relations between groups of actors. For these cases we will prove that
the simple IPFP can be applied to the transformed data in order to fit
the desired models using the method of maximum likelihood.

In order to develop these results we need to consider the original
data and distributions. Our presentation will emphasize the
mathematical structure, ignoring the interpretation of, and motivation for,
the models. We turn first to a development of the Holland and
Leinhardt $p_1$ distribution.

We consider the matrix $X = \{X_{ij} ; i \neq j = 1,2,\ldots,g\}$ as a random
matrix to which the $p_1$ distribution will apply. Consider the dyads,
or subgraphs, $D_{ij}$, between actors i and j, where

$$D_{ij} = (X_{ij}, X_{ji}) .$$

The random variable $D_{ij}$ has 4 possible values,

$D_{ij}$ = (1,1) : Mutual

$D_{ij}$ = (1,0) or (0,1) : Asymmetric

$D_{ij}$ = (0,0) : Null
Under the assumption of dyadic independence, Holland and Leinhardt
(1981) propose the use of the exponential family of distributions,

$$P(X = x) = \exp\Bigl\{ \rho \sum_{i<j} x_{ij} x_{ji} + \theta x_{++} + \sum_i \alpha_i x_{i+} + \sum_j \beta_j x_{+j} \Bigr\} \cdot K(\rho, \theta, \{\alpha_i\}, \{\beta_j\}) .$$


Now consider the random variable Y, equivalent to X, which is
defined, for i < j, as

$Y_{ij11} = X_{ij} \cdot X_{ji}$ : Mutual

$Y_{ij10} = X_{ij} \cdot (1 - X_{ji})$ : Asymmetric

$Y_{ij01} = (1 - X_{ij}) \cdot X_{ji}$ : Asymmetric

$Y_{ij00} = (1 - X_{ij}) \cdot (1 - X_{ji})$ : Null

corresponding to the values of $D_{ij}$. Fienberg and Wasserman (1981)
show that in terms of Y, the log likelihood function for the model
$p_1$ is:


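$$\ell(\rho, \theta, \{\alpha_i\}, \{\beta_j\} \mid y) = \rho \sum_{i<j} y_{ij11} + \theta \sum_{i<j} (y_{ij10} + y_{ij01} + 2 y_{ij11})$$

$$\qquad + \sum_i \alpha_i \Bigl[ \sum_{j>i} (y_{ij10} + y_{ij11}) + \sum_{h<i} (y_{hi01} + y_{hi11}) \Bigr]$$

$$\qquad + \sum_j \beta_j \Bigl[ \sum_{i<j} (y_{ij10} + y_{ij11}) + \sum_{h>j} (y_{jh01} + y_{jh11}) \Bigr]$$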
Now view y as an element of V = {p.m.f.'s on the index set K} where
$K = \{(i,j,k,\ell) ; i < j = 1,2,\ldots,g ; k,\ell = 0,1\}$. If we consider y to
be distributed as a collection of independent Poisson random variables
with mean $q \in V$ then the likelihood is exactly that which would be
obtained by using the loglinear model

$$\ln(q) \in M \subset \mathbb{R}^K .$$


The manifold, M, is spanned by the vectors $f^\delta \in \mathbb{R}^K$, $\delta = 1,2,\ldots,2+2g$,
given by

(5.1)   $\rho$:   $f^1_{ijk\ell} = 1$ if $(k,\ell) = (1,1)$; 0 otherwise

(5.2)   $\theta$:   $f^2_{ijk\ell} = 2$ if $(k,\ell) = (1,1)$; 1 if $(k,\ell) = (1,0)$ or $(0,1)$; 0 otherwise

(5.3)   $\alpha_{i'}$:   $f^{2+i'}_{ijk\ell} = 1$ if $(k,\ell) = (1,1)$ or $(1,0)$ and $i = i'$, $j > i'$, or if $(k,\ell) = (1,1)$ or $(0,1)$ and $j = i'$, $i < i'$; 0 otherwise;   i' = 1,2,..., g

(5.4)   $\beta_{j'}$:   $f^{2+g+j'}_{ijk\ell} = 1$ if $(k,\ell) = (1,1)$ or $(1,0)$ and $j = j'$, $i < j'$, or if $(k,\ell) = (1,1)$ or $(0,1)$ and $i = j'$, $j > j'$; 0 otherwise;   j' = 1,2,..., g

This spanning set was chosen so that the inner product of an observed
y with the f's yields the sufficient statistics:

(5.5)   $\rho$:   $a^1 = \sum_{i<j} y_{ij11}$

(5.6)   $\theta$:   $a^2 = \sum_{i<j} (y_{ij10} + y_{ij01} + 2 y_{ij11})$

(5.7)   $\alpha_{i'}$:   $a^{2+i'} = \sum_{j>i'} (y_{i'j10} + y_{i'j11}) + \sum_{h<i'} (y_{hi'01} + y_{hi'11})$ ,   i' = 1,2,..., g

(5.8)   $\beta_{j'}$:   $a^{2+g+j'} = \sum_{i<j'} (y_{ij'10} + y_{ij'11}) + \sum_{h>j'} (y_{j'h01} + y_{j'h11})$ ,   j' = 1,2,..., g


We now collect the spanning vectors into $F = \{f^h : h = 1,2,\ldots,(2+2g)\}$
and the observed sufficient statistics into
$A = \{a^h ; h = 1,2,\ldots,(2+2g)\}$. If we define the linear space of
P.D.'s, E, by the constraints, F, and corresponding constants, A,
then the M.L.E. is

$$\hat{q} = \mathbb{P}_E(r)$$

where $r \in V$ and $r_{ijk\ell} = c \;\; \forall (i,j,k,\ell) \in K$. Thus a natural setting
for the estimation of $p_1$ is as a loglinear model on V. As the
vector $f^2$ is not a zero-one vector, and cannot be cast in this form,
the basic IPFP cannot be used for the estimation problem. In addition
for many problems g will be so large that Newton's method cannot be used.
It would be desirable if the problem could be put in a form where a
standard algorithm could be used.


The space V is a rather convoluted construction. It would be
more natural to work with V* = {p.m.f.'s on the index set K*} where
$K^* = \{(i,j,k,\ell) : i,j = 1,2,\ldots,g ; k,\ell = 0,1\}$, the space of
g x g x 2 x 2 tables. To this end consider the transformation
$g : V \to V^*$ with y* = g(y) defined by

$$y^*_{ijk\ell} = y^*_{ji\ell k} = \begin{cases} y_{ijk\ell} & i < j \\ 0 & i = j . \end{cases}$$

In other words we have transformed the problem into a g x g x 2 x 2
contingency table with zeros on the "diagonals". The sufficient
statistics (5.5) - (5.8) appear (sometimes more than once) as the
[12], [13], [14], [23], [24], and [34] margins of y*. Now consider
the linear space of p.m.f.'s, E*, defined by

F* = {[12], [13], [14], [23], [24], and [34] margin functions}

and

A* = {[12], [13], [14], [23], [24], and [34] margins of y*} .
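
To make the recoding concrete, the sketch below builds the dyad indicators y from a small adjacency matrix, fills in the g x g x 2 x 2 table y* symmetrically, and reads some of the sufficient statistics off its margins. The function names, the 0-based indexing, and the 4-node adjacency matrix are assumptions made for illustration.

```python
from itertools import product

def dyad_table(x):
    """y[i, j, k, l] = 1 when (x[i][j], x[j][i]) == (k, l), for i < j."""
    g = len(x)
    return {(i, j, k, l): int((x[i][j], x[j][i]) == (k, l))
            for i, j in product(range(g), repeat=2) if i < j
            for k, l in product((0, 1), repeat=2)}

def symmetric_fill(y, g):
    """y*[i,j,k,l] = y*[j,i,l,k] = y[i,j,k,l] for i < j; 0 when i == j."""
    y_star = {}
    for i, j, k, l in product(range(g), range(g), (0, 1), (0, 1)):
        if i < j:
            y_star[i, j, k, l] = y[i, j, k, l]
        elif i > j:
            y_star[i, j, k, l] = y[j, i, l, k]
        else:
            y_star[i, j, k, l] = 0
    return y_star

def margin(table, axes):
    """Sum the table over every axis not listed in `axes`."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0) + value
    return out

# Hypothetical digraph on 4 actors (x[i][j] = 1 if i connects to j).
x = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y_star = symmetric_fill(dyad_table(x), len(x))
m13 = margin(y_star, (0, 2))   # [13] margin: out-degrees at k = 1
m23 = margin(y_star, (1, 2))   # [23] margin: in-degrees at k = 1
m34 = margin(y_star, (2, 3))   # [34] margin: twice the mutual count at (1, 1)
```

In this sketch the [13] margin at k = 1 reproduces the statistic (5.7) for each actor, the [23] margin at k = 1 reproduces (5.8), and the (1,1) cell of the [34] margin is twice the statistic (5.5), since each dyad is entered twice in y*.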

We should note that E* is not equal to g(E). In fact,

$$g(E) = E^* \cap \{ y^*_{ijk\ell} = y^*_{ji\ell k} \} .$$

In other words g(E) is a strict subset of E*. As the model, E*,
requires just simple margins of a rectangular data array, the basic
IPFP found in many computer packages can be used. We would like to be
able to fit just E* to y*, ignoring the symmetry constraints.

Let

$$\hat{q}^* = \mathbb{P}_{E^*}(g(r))$$

where

$$r^*_{ijk\ell} = g(r)_{ijk\ell} = \begin{cases} c & i \neq j \\ 0 & i = j . \end{cases}$$

As $\hat{q}^*$ is easy to calculate we would like to assert that $\hat{q}^* \in g(E)$.

One method of proceeding would be to go ahead and fit E* to y*.
If $\hat{q}^*$ has the desired symmetries then all is well. In general we
need to prove that for an arbitrary y*, $\hat{q}^*$ must be in g(E).

Our first version of this proof relied upon the actual calculations
involved in the IPFP to show the symmetry. The proof presented
here is much simpler and relies only on an invariance argument.

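Let h denote the mapping from $\mathbb{R}^{g \times g \times 2 \times 2}$ into $\mathbb{R}^{g \times g \times 2 \times 2}$ defined
by

$$h(z)_{ijk\ell} = z_{ji\ell k} ,$$

i.e., the symmetry transformation. In order that $\hat{q}^*$ be in g(E) we
require that

$$h\bigl(\mathbb{P}_{E^*}(g(r))\bigr) = \mathbb{P}_{E^*}(g(r)) .$$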

Now notice that

h([12] margin function) = [12] margin function ,

h([13] margin function) = [24] margin function ,

and that each of the other margin functions in F* is mapped into
another margin function in F*. Similarly

h([13] margin for data y*) = [24] margin for data y* .

In other words, h(F*) = F* and h(A*) = A* which together imply
that h(E*) = E*. Also note that h(g(r)) = g(r). We can then
assert that

$$\hat{q}^* = \mathbb{P}_{E^*}(g(r)) = \mathbb{P}_{h(E^*)}(h(g(r))) = h(\hat{q}^*)$$

and hence the result.
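
The invariance result can also be observed numerically: fitting only the six two-way margins of a symmetrically filled table, and ignoring the symmetry constraints, yields fitted values that are symmetric at convergence. The sketch below is an illustration under assumptions: the function names, the pure-Python IPFP over a dictionary of cells, the fixed number of cycles, and the artificial positive table y are all hypothetical (a real p_1 data set would contain 0/1 dyad indicators).

```python
from itertools import product

def symmetric_fill(y, g):
    """y*[i,j,k,l] = y*[j,i,l,k] = y[i,j,k,l] for i < j; 0 when i == j."""
    y_star = {}
    for i, j, k, l in product(range(g), range(g), (0, 1), (0, 1)):
        if i < j:
            y_star[i, j, k, l] = float(y[i, j, k, l])
        elif i > j:
            y_star[i, j, k, l] = float(y[j, i, l, k])
        else:
            y_star[i, j, k, l] = 0.0   # structural zero on the diagonal
    return y_star

def margin(table, axes):
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out

# Axis pairs for the [12], [13], [14], [23], [24] and [34] margins.
TWO_WAY = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def ipfp(start, data, margins=TWO_WAY, cycles=1000):
    """Cyclically rescale `start` to match each listed margin of `data`."""
    fit = dict(start)
    targets = [margin(data, axes) for axes in margins]
    for _ in range(cycles):
        for axes, target in zip(margins, targets):
            current = margin(fit, axes)
            for cell in fit:
                key = tuple(cell[a] for a in axes)
                if current[key] > 0.0:   # skip the structural zeros
                    fit[cell] *= target[key] / current[key]
    return fit

g_nodes = 4
# Artificial positive table on i < j (hypothetical values).
y = {(i, j, k, l): 1.0 + (3 * i + 5 * j + 2 * k + l) % 4
     for i, j in product(range(g_nodes), repeat=2) if i < j
     for k, l in product((0, 1), repeat=2)}
y_star = symmetric_fill(y, g_nodes)
start = {cell: (0.0 if cell[0] == cell[1] else 1.0) for cell in y_star}
q_star = ipfp(start, y_star)
asymmetry = max(abs(q_star[i, j, k, l] - q_star[j, i, l, k])
                for (i, j, k, l) in q_star)
```

Individual IPFP steps need not preserve the symmetry; it is only the limit that is guaranteed to be symmetric, so `asymmetry` is small rather than exactly zero after a finite number of cycles.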

We have now shown that the M.L.E. $\hat{q}^*$ resulting from fitting E*
to y* is in g(E) and hence $\hat{q} = g^{-1}(\hat{q}^*)$.

There are numerous submodels of $p_1$ considered by Holland and
Leinhardt (1981) and Fienberg and Wasserman (1981). These models,
represented in terms of parameters and margins in the y* table,
are listed in Table 5.1.


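Table 5.1 Submodels of p_1

Special Case   Parameters                                Margins Fitted
(i)            $\rho, \theta, \{\alpha_i\}, \{\beta_j\}$      [12] [23] [13] [24] [14] [34]
(ii)           $\theta, \{\alpha_i\}, \{\beta_j\}$            [12] [23] [13] [24] [14]
(iii)          $\rho, \theta, \{\alpha_i\}$                   [12] [34] [13] [24]
(iv)           $\theta, \{\alpha_i\}$                         [12] [13] [24]
(v)            $\rho, \theta, \{\beta_j\}$                    [12] [34] [14] [23]
(vi)           $\theta, \{\beta_j\}$                          [12] [14] [23]
(vii)          $\rho, \theta$                                 [12] [34]
(viii)         $\theta$                                       [12] [3] [4]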

Each of these sets of margins is invariant under h and the above
argument is applicable.
For the $p_1$ problem all of the models in Table 5.1 can be fit
using the basic IPFP on the data y*.

Our second example concerns a class of loglinear models for
multivariate directed graphs as described in Fienberg, Meyer and
Wasserman (1981). They consider a set of data concerning the
interrelationships between 73 organizations in a small community. Three
types of relationships were observed for each of the pairs of
organizations, but for simplicity we restrict our attention to two of these
criteria, support and money. For each criterion the organizations were
asked to respond to the questions:

(i) to which organizations do you give support (money)? 

(ii) from which organizations do you receive support (money)? 

A particular directed relationship (i.e., giving or receiving) is 
regarded to be present if either or both the organizations in a pair 
perceived the relationship. For each pair of organizations it is 
possible to construct a four-vector of zeros and ones indicating the 
presence or absence of (support out, support in, money out, money in). 
Consider for the moment just the support relationship. A pair of
organizations is said to have a Mutual relationship if they support
each other (i.e., (support out, support in) = (1,1)), a Null relationship
if neither supports the other (i.e., (0,0)), or an Asymmetric
relationship if support is unreciprocated (i.e., (0,1) or (1,0)).

If we aggregate over all $\binom{73}{2} = 2628$ pairs of organizations there
are ten distinguishable support-money relationships, namely,
MM   with four vector   (1,1,1,1)
MA                      (1,1,0,1) or (1,1,1,0)
MN                      (1,1,0,0)
AM                      (0,1,1,1) or (1,0,1,1)
AA                      (0,1,0,1) or (1,0,1,0)
AĀ                      (0,1,1,0) or (1,0,0,1)
AN                      (0,1,0,0) or (1,0,0,0)
NM                      (0,0,1,1)
NA                      (0,0,1,0) or (0,0,0,1)
NN                      (0,0,0,0)

Notice that when both relationships are asymmetric there are two
different cases, corresponding to whether the relationships flow in
the same or in different ways. We denote the table of observed
probabilities by z where for example $z_{MM}$ is the number of mutual-mutual
relationships divided by $\binom{73}{2}$. The table is represented by


                         MONEY
  z              M        A               N
            M    z_MM     z_MA            z_MN
  SUPPORT   A    z_AM     z_AA, z_AĀ      z_AN
            N    z_NM     z_NA            z_NN


Fienberg, Meyer and Wasserman (1981) model the probability,
$q = \{q_{ab} ; a,b = M,A,N\}$ that a randomly selected dyad will be assigned
to a certain cell. They consider linear models for
$\xi = \{\xi_{ab} ; a,b = M,A,N\}$ where

$$\xi_{ab} = \begin{cases} \log(q_{ab}) & \text{if } a,b \text{ each equal M or N} \\ \log(q_{ab}/2) & \text{if either } a \text{ or } b \text{ equals A.} \end{cases}$$

These models are affine translations of loglinear models for q. The
arguments presented here apply to all of their models.

The model we consider takes as a linear space, E, of p.m.f.'s the
set of tables, s, which have margins $s_{a+}$ and $s_{+b}$, a,b = M,A,N,
which are the same as the corresponding margins for the z-table. For
example we require

$$s_{A+} = s_{AM} + s_{AA} + s_{A\bar{A}} + s_{AN} = z_{AM} + z_{AA} + z_{A\bar{A}} + z_{AN} = z_{A+} .$$

In order to have the model be linear in $\xi$, we need

$$\hat{q} = \mathbb{P}_E(r)$$

where

$$r_{ab} = \begin{cases} 1 & \text{if } a,b \text{ each equal M or N} \\ 2 & \text{if either } a \text{ or } b \text{ equals A.} \end{cases}$$


As the model space can be spanned by vectors consisting of 0's and 1's, 
the simple IPFP, which takes an initial table, r, and successively 
adjusts the row and column "margins" to match those in the observed 
table, can be used. This algorithm is easy to do by hand, but because 
the z-table is not rectangular (i.e., it has 10 cells rather than the 
9 one would expect), and consequently has an extended interpretation of 
margin totals, many standard IPFP computer programs would not be able to 
analyze this table. Moreover, many of the models considered by 
Fienberg, Meyer and Wasserman are not so simple, and the 
computations on the z-table require more than the simple 
IPFP. For this reason we prefer to work with a transformed problem, 
where the sufficient statistics for the models can be represented by 
simple marginal totals. 
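
To make the "extended interpretation of margin totals" concrete, the following is a minimal sketch of the simple IPFP applied directly to the ten-cell z-table. It is an illustration under stated assumptions rather than the authors' program: the counts are made up for the example (they are not the data analyzed here), and the names `ipfp`, `margin` and the cell label `AAbar` are hypothetical.

```python
CELLS = ["MM", "MA", "MN", "AM", "AA", "AAbar", "AN", "NM", "NA", "NN"]

def support(cell):
    """Row (support) category of a cell label."""
    return cell[0]

def money(cell):
    """Column (money) category of a cell label; both AA cells lie in column A."""
    return cell[1]

def margin(table, key):
    """Sum the table over all cells sharing the same row or column category."""
    totals = {}
    for cell, value in table.items():
        totals[key(cell)] = totals.get(key(cell), 0.0) + value
    return totals

def ipfp(start, observed, keys=(support, money), tol=1e-12, max_cycles=100):
    """Rescale `start` until its margins match those of `observed`."""
    fit = dict(start)
    for _ in range(max_cycles):
        worst = 0.0
        for key in keys:
            want, have = margin(observed, key), margin(fit, key)
            worst = max(worst, max(abs(want[m] - have[m]) for m in want))
            for cell in fit:
                fit[cell] *= want[key(cell)] / have[key(cell)]
        if worst < tol:  # both margins already matched at the start of this cycle
            break
    return fit

# Hypothetical counts, for illustration only.
counts = {"MM": 40, "MA": 60, "MN": 100, "AM": 30, "AA": 80,
          "AAbar": 20, "AN": 400, "NM": 10, "NA": 90, "NN": 1170}
total = sum(counts.values())
z = {cell: n / total for cell, n in counts.items()}

# Starting table r: 2 if either relationship is asymmetric, otherwise 1.
r = {cell: 2.0 if "A" in cell else 1.0 for cell in CELLS}
q = ipfp(r, z)
```

Note that row A and column A each sum over four cells rather than three, which is exactly the feature a rectangular-table IPFP routine would not accommodate.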

An alternate, though somewhat deceptive, description of the data 
is to consider four-vectors for each of the $\binom{73}{2} \times 2$ ordered pairs of 
organizations and to aggregate this into a $2^4$ table, $y = \{y_{ijkl};\ 
i,j,k,l = 1,2\}$, where a 1 indicates the presence of a flow and a 2 
indicates the absence of a flow. Thus $y_{1111}$ is the number of mutual-mutual 
relationships divided by 5256. The y-table duplicates certain 
relationships and gives double weight to certain others. The y-table 
has the form 


                           money out       1            1            2            2
                           money in        1            2            1            2
      supp out   supp in

         1          1                  $y_{1111}$   $y_{1112}$   $y_{1121}$   $y_{1122}$

         1          2                      .            .            .            .

         2          1                      .            .            .            .

         2          2                      .            .            .            .


We now consider the transformation which maps the z-table into the 
y-table; viz., 

$$g(z) = \begin{bmatrix} 2z_{MM} & z_{MA} & z_{MA} & 2z_{MN} \\ z_{AM} & z_{AA} & z_{A\bar{A}} & z_{AN} \\ z_{AM} & z_{A\bar{A}} & z_{AA} & z_{AN} \\ 2z_{NM} & z_{NA} & z_{NA} & 2z_{NN} \end{bmatrix}$$


We denote the factors support (out, in), money (out, in) by the 
numbers 1, 2, 3, and 4. It is now easy to see that the marginal sums 
considered for the z-table can all be found (twice) in the [12] and 
[34] margins of the y-table. Also note that the y-table has a strong 
symmetry, $y_{ijkl} = y_{jilk}$. Now $g(\mathcal{E})$ is just the set of tables 
which (i) have the correct [12] and [34] margins and (ii) preserve the 
observed symmetry in the y-table. Consider just the first of these 
conditions, ignoring the symmetry constraint. It is this model which 
we shall consider to be $\mathcal{E}^*$. As we have relaxed some conditions it is 
clear that $g(\mathcal{E}) \subset \mathcal{E}^*$. 
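
The transformation g and the two properties just noted (margins found twice, and the symmetry of the y-table) can be checked numerically. The sketch below is illustrative only: it keys the y-table by 0/1 four-vectors (1 = flow present, 0 = flow absent) rather than by the 1/2 coding used in the text, the counts are hypothetical, and the names `classify`, `g`, `margin` and `AAbar` are assumptions made here.

```python
from itertools import product

# Four-vectors are (support out, support in, money out, money in).
VECTORS = list(product((0, 1), repeat=4))

def classify(v):
    """Return the z-table cell ('MM', 'MA', ..., 'AAbar') of a four-vector."""
    kind = {(1, 1): "M", (0, 0): "N"}
    supp, mon = v[:2], v[2:]
    label = kind.get(supp, "A") + kind.get(mon, "A")
    return "AAbar" if label == "AA" and supp != mon else label

# Number of four-vectors falling in each z-table cell (1 or 2).
MULT = {}
for v in VECTORS:
    MULT[classify(v)] = MULT.get(classify(v), 0) + 1

def g(z):
    """Double cells with one four-vector; copy cells with two to both."""
    return {v: (2 // MULT[classify(v)]) * z[classify(v)] for v in VECTORS}

def margin(y, axes):
    """Sum a table keyed by four-vectors over all factors not in `axes`."""
    out = {}
    for v, value in y.items():
        key = tuple(v[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out

# Hypothetical counts, for illustration only.
counts = {"MM": 40, "MA": 60, "MN": 100, "AM": 30, "AA": 80,
          "AAbar": 20, "AN": 400, "NM": 10, "NA": 90, "NN": 1170}
total = sum(counts.values())
z = {cell: n / total for cell, n in counts.items()}

y = g(z)
m12 = margin(y, (0, 1))  # the [12] margin: support (out, in)
m34 = margin(y, (2, 3))  # the [34] margin: money (out, in)
z_support = {a: sum(p for c, p in z.items() if c[0] == a) for a in "MAN"}
z_money = {b: sum(p for c, p in z.items() if c[1] == b) for b in "MAN"}

# g applied to the starting table r (1 for M/N-only cells, 2 otherwise).
g_r = g({cell: 2.0 if "A" in cell else 1.0 for cell in z})
```

In this sketch the M and N margins of the z-table appear doubled in a single cell of the [12] (or [34]) margin, while the A margin appears once in each of two cells, and `g_r` is constant over all sixteen cells.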

From here on the argument proceeds in the same manner as in the 
single relationship case. It is convenient now to explicitly define 
the space $\mathcal{E}^*$ and the conditions we need to verify to show that 
$\mathrm{IP}_{\mathcal{E}^*}(g(r))$ is in $g(\mathcal{E})$. Consider 

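$F^* = \{f_1, \ldots, f_8\}$ where each $f_i$ is a 4 × 4 array of 0's and 1's 
laid out like the y-table: $f_1, \ldots, f_4$ are the indicators of its four 
rows (the cells of the [12] margin) and $f_5, \ldots, f_8$ are the indicators 
of its four columns (the cells of the [34] margin), 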

and constants $A^* = \{a_1, \ldots, a_8\}$ where $a_i = \langle f_i, g(z) \rangle$. Note that 
$a_2 = a_3$ and $a_6 = a_7$. We define $\mathcal{E}^*$ to be the space of P.D.'s 
defined by $F^*$ and $A^*$. 

Now consider the symmetry transformation: 

$$h : y_{ijkl} \to y_{jilk}.$$

For $\mathrm{IP}_{\mathcal{E}^*}(g(r))$ to be in $g(\mathcal{E})$ we require 

$$h(\mathrm{IP}_{\mathcal{E}^*}(g(r))) = \mathrm{IP}_{\mathcal{E}^*}(g(r)).$$

It is possible to assert this because the space $\mathcal{E}^*$ is invariant 
under h. Specifically $h(f_i) = f_i$ for i = 1,4,5,8 and $h(f_2) = f_3$, 
$h(f_3) = f_2$, $h(f_6) = f_7$ and $h(f_7) = f_6$. Because $a_2 = a_3$ and 
$a_6 = a_7$ the linear space $h(\mathcal{E}^*)$ generated by $h(F^*)$ and $h(A^*)$ is 
the same as $\mathcal{E}^*$. We also note that $h(g(r)) = g(r)$, because of the 
nature of the g function. That is, the starting values necessarily satisfy 
the symmetry constraints. Now let 

$$q^* = \mathrm{IP}_{\mathcal{E}^*}(g(r)) \quad \text{and} \quad \tilde{q}^* = \mathrm{IP}_{h(\mathcal{E}^*)}(h(g(r))) = \mathrm{IP}_{\mathcal{E}^*}(g(r)).$$

But note that $\tilde{q}^* = h(q^*)$ as all we have done is relabel the 
co-ordinates. Thus 

$$q^* = \tilde{q}^* = h(q^*),$$

i.e., the fitted P.D. (i) is invariant under h and (ii) is in $\mathcal{E}^*$. 
Thus $q^*$ is in $g(\mathcal{E})$ and $g^{-1}(q^*)$ is the fitted P.D. in the space 
of z-tables. 

For any of the other models considered by Fienberg, Meyer and 
Wasserman, it is easy to show that the space, $\mathcal{E}^*$, is invariant under 
h and thus the above argument still works. 

In these examples, g(r) is the uniform distribution; thus the 
IPFP with starting value all ones is an appropriate algorithm. For 
some of the models, the appropriate margins of the y*-table represent 
a decomposable model; in fact the model [12], [34] is itself decomposable. 
Thus we have not only found an easy computational procedure, 
but have also discovered closed-form estimates for some of the models. 
The existence and nature of closed-form estimates varies with the 
number of relationships between actors which are modeled. 
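
As an illustration of the closed-form estimates mentioned above, the sketch below fits the model [12], [34] to a transformed table using the standard product-of-margins formula for two disjoint margins, $\hat{y}_{ijkl} = y_{ij++}\, y_{++kl} / y_{++++}$, and then maps the fit back through $g^{-1}$. It is a sketch under assumptions, not the authors' code: the counts are hypothetical, the table is keyed by 0/1 four-vectors (1 = flow present), and the names `fit_12_34`, `g_inverse` and `AAbar` are introduced here.

```python
from itertools import product

VECTORS = list(product((0, 1), repeat=4))  # (supp out, supp in, money out, money in)

def classify(v):
    """Return the z-table cell ('MM', 'MA', ..., 'AAbar') of a four-vector."""
    kind = {(1, 1): "M", (0, 0): "N"}
    supp, mon = v[:2], v[2:]
    label = kind.get(supp, "A") + kind.get(mon, "A")
    return "AAbar" if label == "AA" and supp != mon else label

MULT = {}  # number of four-vectors per z-table cell (1 or 2)
for v in VECTORS:
    MULT[classify(v)] = MULT.get(classify(v), 0) + 1

def g(z):
    """Map the ten-cell z-table to the sixteen-cell y-table."""
    return {v: (2 // MULT[classify(v)]) * z[classify(v)] for v in VECTORS}

def g_inverse(y):
    """Undo g; assumes y satisfies the symmetry y[i,j,k,l] == y[j,i,l,k]."""
    return {classify(v): y[v] / (2 // MULT[classify(v)]) for v in VECTORS}

def fit_12_34(y):
    """Closed-form fit of the model [12], [34]: product of the two margins."""
    total = sum(y.values())
    m12, m34 = {}, {}
    for v, value in y.items():
        m12[v[:2]] = m12.get(v[:2], 0.0) + value
        m34[v[2:]] = m34.get(v[2:], 0.0) + value
    return {v: m12[v[:2]] * m34[v[2:]] / total for v in VECTORS}

# Hypothetical counts, for illustration only.
counts = {"MM": 40, "MA": 60, "MN": 100, "AM": 30, "AA": 80,
          "AAbar": 20, "AN": 400, "NM": 10, "NA": 90, "NN": 1170}
total = sum(counts.values())
z = {cell: n / total for cell, n in counts.items()}

y_fit = fit_12_34(g(z))   # fitted table in the y-space
q = g_inverse(y_fit)      # fitted table in the z-space
```

Because the fitted y-table inherits the symmetry of g(z), `g_inverse` is well defined on it, and the resulting q reproduces the support and money margins of z.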

The analysis of the multiple relationship data that we have considered 
has been for the data aggregated over all the actors. In some 
situations it may be desirable to aggregate over only groups of actors, 
in which case there is a $2^4$ (or with 3 relationships, $2^6$) table for 
each group of actors. In this manner it is possible for the number of 
entries in the table, and the number of parameters in the corresponding 
models, to grow very large. Under these circumstances the transformation 
techniques outlined in this chapter prove to be of considerable 
practical use. 

6. Desiderata 

We conclude this chapter with a few questions and cautions. The 
examples have shown situations where, for reasons of computational ease, 
it was desirable to transform a contingency table into a related but 
larger table. In the transformed table it was possible to fit a model 
using the standard IPFP whereas in the original table the corresponding 
model would have required a more complicated algorithm. This approach 
of using transformed tables is especially important in practice as 
versions of the standard IPFP are widely available and easy to use. An 
additional bonus which can sometimes be found in the transformed table 
is the existence of closed form maximum likelihood estimates. The theory 
about when closed form estimates exist in complete tables with factorial 
models is well known and such situations are easily recognized. On the 
contrary, when a table is incomplete or has a more complicated structure, 
very little is known about the existence of closed form estimates. Our 
techniques have merely scratched the surface of the more general question 
of closed form estimates. A more general theory of closed form estimates 
for arbitrary loglinear models would seem desirable; perhaps investigations 
of the more general IPFP will aid in this. 

Throughout our discussion we have ignored the important questions 
of degree of freedom calculations and asymptotic covariance estimates 
for the M.L.E. When $g(\mathcal{E}) = \mathcal{E}^*$, that is, we are essentially only 
relabeling the problem, then any d.f. and covariances calculated in $\mathcal{E}^*$ 
can be transformed back to $\mathcal{E}$. When $g(\mathcal{E}) \subset \mathcal{E}^*$, special care must 
be taken to calculate the appropriate d.f. in $\mathcal{E}$. We know of no 
exact procedure for transforming covariance estimates in $\mathcal{E}^*$ back to 
$\mathcal{E}$ and suspect that it is not possible. 